-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Enumerated Data Types #4051
Conversation
@KiterLuc Could you take a look at the enumeration loading logic here: (Link is to the changes in I'm pretty sure this will run on the cloud side of a remote query so it should work? I'm mostly not sure how this would interact when we have to submit a query multiple times before completion. While I think it should work, I'm also not sure there's not a better place for that logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial feedback while scanning to load everything into memory... I'll do another pass focussing more on storage format on the next revision.
test/src/unit-cppapi-enumerations.cc
Outdated
TEST_CASE("C++ API: Enumeration creation fixed size", "[cppapi][enumeration]") { | ||
TestData td; | ||
|
||
std::vector<uint32_t> values = {1, 2, 3, 4, 5}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I really like how all test cases have these 'paragraph'. Would it be possible to add a one liner comment to each so that it's easy to scan the file and see what each test case intends on doing without reading the whole code? Later down the line it makes it easier to maintain the tests or scan for missing coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests will probably change drastically. I just did the first thing that came to mind to test while I was developing. I like to write the implementation and then come back later with my brain in testing mode to write out a comprehensive test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've massively rewritten this test suite so its completely different so I'm going to let you check and see if its documented enough to your liking. The bulk of these tests are moved to unit-enumerations.cc
which uses the internal C++ APIs (as opposed to the API in tiledb/sm/cpp_api/
). All of the tests are short and have descriptive names that describe what's being tested.
c15874e
to
a8f12ea
Compare
5ae0fcf
to
4928106
Compare
@KiterLuc It took me longer to address you're earlier feed back about all the magic number conversions in query_ast.cc so I didn't get nearly as far as I was planning today. I've also not gone through and address all of your earlier comments around things like documentation, however I'm pretty sure all of the magic numbers and things like UINT32_MAX instead of TILEDB_VAR_NUM have all been fixed (though I haven't audited to be absolutely sure). Other than that, there's a couple missing internal APIs exposed to the C and CPP APIs. Re-reading the storage stuff and then going through |
Also, here's my current coverage report as of a few seconds before posting this. N.B. Gists truncate after the first 1M bytes so the last few diff's aren't included. https://gistpreview.github.io/?95c129f102d34cda4f38e7ff9331b84e |
4928106
to
16ef8d2
Compare
32a8576
to
4c08bb3
Compare
format_spec/enumeration.md
Outdated
|
||
| **Field** | **Type** | **Description** | | ||
| :--- | :--- | :--- | | ||
| Version number | `uint32_t` | Format version number of the generic tile | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this accurate? The format version of the generic tile should be in the generic tile header... So I'm not sure why it would be included here again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was implemented quite poorly before by attempting to re-use the format_version as the enumerations version. Since then I've split the naming and versioning so that things are more clear.
You were quite right that the format version is already part of the GenericTileIO header so if we really needed it, we could have gotten it there.
This however is a version for the Enumeration data itself. It now starts at 0, and if we ever have to change this (de)?serialization implementation for the contents of the GenericTileIO "payload" we can do that by using this Enumerations specific version.
7937d8f
to
534d4a2
Compare
@teo-tsirpanis You mean just drop something into the examples directory to show how it works? |
@davisp yes. You can do it later if it's too much work but would be nice to have. |
@teo-tsirpanis I tossed a simple example in one our channels yesterday, I will DM you that. It would be better for @davisp to add an official 'blessed' example or two as the API changed a little and got 'richer'. I may be behind the curve. |
/** | ||
* Retrieves an attribute's enumeration given the attribute name (key). | ||
* | ||
* **Example:** | ||
* | ||
* The following retrieves the first attribute in the schema. | ||
* | ||
* @code{.c} | ||
* tiledb_attribute_t* attr; | ||
* tiledb_array_schema_get_enumeration( | ||
* ctx, array_schema, "attr_0", &enumeration); | ||
* // Make sure to delete the retrieved attribute in the end. | ||
* @endcode | ||
* | ||
* @param ctx The TileDB context. | ||
* @param array The TileDB array. | ||
* @param name The name (key) of the attribute from which to | ||
* retrieve the enumeration. | ||
* @param enumeration The enumeration object to retrieve. | ||
* @return `TILEDB_OK` for success and `TILEDB_ERR` for error. | ||
*/ | ||
TILEDB_EXPORT capi_return_t tiledb_array_get_enumeration( | ||
tiledb_ctx_t* ctx, | ||
const tiledb_array_t* array, | ||
const char* name, | ||
tiledb_enumeration_t** enumeration) TILEDB_NOEXCEPT; | ||
|
||
/** | ||
* Load all enumerations for the array. | ||
* | ||
* **Example:** | ||
* | ||
* @code{.c} | ||
* tiledb_array_load_all_enumerations(ctx, array); | ||
* @endcode | ||
* | ||
* @param ctx The TileDB context. | ||
* @param array The TileDB array. | ||
* @param latest_only If non-zero, only load enumerations for the latest schema. | ||
* @return `TILEDB_OK` for success and `TILEDB_ERR` for error. | ||
*/ | ||
TILEDB_EXPORT capi_return_t tiledb_array_load_all_enumerations( | ||
tiledb_ctx_t* ctx, | ||
const tiledb_array_t* array, | ||
int latest_only) TILEDB_NOEXCEPT; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these functions part of the array API? Shouldn't they belong to array schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ArraySchema has no access to the ContextResources/ArrayDirectory used for actually loading things from disk so I added it to the Array instead of forcing users to do something along the lines of schema->load_enumeration(array->array_directory())
or some such.
@teo-tsirpanis I've added an example here: befe066 Let me know if you'd like to add anything else to it. |
Thanks! |
|
||
#include <iostream> | ||
|
||
TEST_CASE( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason these tests can't live in tiledb/api/capi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all testing API's from the Attribute, ArraySchema, and Array classes which aren't yet migrated to the new API subdirectory. Accidentally replied in the wrong spot yesterday, so this comment is moved.
7a226f2
to
5e509ba
Compare
18ac027
to
3c93077
Compare
This PR adds the Enumerated data types. Enumerated data types work by adding an Enumeration to the ArraySchema, setting an enumeration name on an attribute, and then adding the attribute to the ArraySchema. An Enumeration object contains a short list of options and a vector of values. An attribute that has an enumeration name set must have an integral type that is wide enough to index all of the enumerated values. Changes to the values of an enumeration (any of adding, renaming, or removing) can be accomplished via ArraySchemaEvolution.
3c93077
to
1bd816f
Compare
…ce with standalone link policy PR #4051 took object library `generic_tile_io` out of conformance with the policy that each OL should link standalone. This PR corrects this. Note: In its present state this PR is not suitable for review or merge. It's branched from a branch that itself has not merged yet and needs to be rebased before review is feasible.
…ne link policy PR #4051 took object library `generic_tile_io` out of conformance with the policy that each OL should link standalone. This PR corrects this.
[SC-51428](https://app.shortcut.com/tiledb-inc/story/51428/enumeration-path-map-does-not-exist-in-the-array-schema-format-spec) I noticed that the array schema format specification does not include the enumeration name-path map introduced in #4051. This PR updates the documentation. I used the term "enumeration filename" to describe the string written after the enumeration name because [it is just the file's name](https://github.com/TileDB-Inc/TileDB/blob/78ac1d2ec338fd468eb63481e85049215908e39f/tiledb/sm/array/array_directory.cc#L1324-L1326), and updated previous usages of "enumeration pathname" or "enumeration URI" in code. --- TYPE: NO_HISTORY DESC: Added documentation for the enumeration path map in array scehmas, present since format version 20.
This PR adds the Enumerated data types. Enumerated data types work by
adding an Enumeration to the ArraySchema, setting an enumeration name on
an attribute, and then adding the attribute to the ArraySchema.
An Enumeration object contains a short list of options and a vector of
values. An attribute that has an enumeration name set must have an
integral type that is wide enough to index all of the enumerated values.
Changes to the values of an enumeration (any of adding, renaming, or
removing) can be accomplished via ArraySchemaEvolution.
TYPE: FEATURE
DESC: Enumerated data types